13 research outputs found

    Tight Bounds for Local Glivenko-Cantelli

    This paper addresses the statistical problem of estimating the infinity-norm deviation between the empirical mean and the distribution mean for high-dimensional distributions on $\{0,1\}^d$, potentially with $d=\infty$. Unlike traditional bounds such as the classical Glivenko-Cantelli theorem, we explore the instance-dependent convergence behavior. For product distributions, we provide the exact non-asymptotic behavior of the expected maximum deviation, revealing various regimes of decay. In particular, these tight bounds demonstrate the necessity of a previously proposed factor for an upper bound, answering a corresponding COLT 2023 open problem. We also consider general distributions on $\{0,1\}^d$ and provide the tightest possible bounds for the maximum deviation of the empirical mean given only the mean statistic. Along the way, we prove a localized version of the Dvoretzky-Kiefer-Wolfowitz inequality. Additionally, we present some results for two other cases, one where the deviation is measured in some $q$-norm, and the other where the distribution is supported on a continuous domain $[0,1]^d$, and also provide some high-probability bounds for the maximum deviation in the independent Bernoulli case. Comment: ALT 202
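The quantity studied in this abstract, the expected maximum deviation of the empirical mean for a product of independent Bernoulli coordinates, can be estimated numerically. The following is a minimal Monte Carlo sketch, not the paper's method: the function names, the choice of means `p`, the number of trials, and the seed are all assumptions made for illustration, and the dimension is truncated to a finite `d`.

```python
import random


def max_deviation(p, n, rng):
    """Return max_j |p_hat_j - p_j| for n i.i.d. draws from a product Bernoulli(p)."""
    worst = 0.0
    for pj in p:
        # Count successes of coordinate j over n independent draws.
        successes = sum(1 for _ in range(n) if rng.random() < pj)
        worst = max(worst, abs(successes / n - pj))
    return worst


def expected_max_deviation(p, n, trials=200, seed=0):
    """Monte Carlo estimate of E[max_j |p_hat_j - p_j|] (hypothetical helper)."""
    rng = random.Random(seed)
    return sum(max_deviation(p, n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    # Assumed example instance: means decaying polynomially, truncated to d = 50.
    p = [1.0 / (j + 2) ** 2 for j in range(50)]
    for n in (10, 100, 1000):
        print(n, round(expected_max_deviation(p, n), 4))
```

Running the script prints the estimate for a few sample sizes, which gives a rough picture of how the decay depends on the particular means chosen.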

    Probabilistic bounds on the $k$-Traveling Salesman Problem and the Traveling Repairman Problem

    The $k$-traveling salesman problem ($k$-TSP) seeks a tour of minimal length that visits a subset of $k\leq n$ points. The traveling repairman problem (TRP) seeks a complete tour with minimal latency. This paper provides constant-factor probabilistic approximations of both problems. We first show that the optimal length of the $k$-TSP path grows at a rate of $\Theta\left(k/n^{\frac{1}{2}\left(1+\frac{1}{k-1}\right)}\right)$. The proof provides a constant-factor approximation scheme, which solves a TSP in a high-concentration zone -- leveraging large deviations of local concentrations. Then, we show that the optimal TRP latency grows at a rate of $\Theta(n\sqrt n)$. This result extends the classical Beardwood-Halton-Hammersley theorem to the TRP. Again, the proof provides a constant-factor approximation scheme, which visits zones by decreasing order of probability density. We discuss practical implications of this result in the design of transportation and logistics systems. Finally, we propose dedicated notions of fairness -- randomized population-based fairness for the $k$-TSP and geographical fairness for the TRP -- and give algorithms to balance efficiency and fairness.

    Memory-Constrained Algorithms for Convex Optimization via Recursive Cutting-Planes

    We propose a family of recursive cutting-plane algorithms to solve feasibility problems with constrained memory, which can also be used for first-order convex optimization. Precisely, in order to find a point within a ball of radius $\epsilon$ with a separation oracle in dimension $d$ -- or to minimize $1$-Lipschitz convex functions to accuracy $\epsilon$ over the unit ball -- our algorithms use $\mathcal O(\frac{d^2}{p}\ln \frac{1}{\epsilon})$ bits of memory, and make $\mathcal O((C\frac{d}{p}\ln \frac{1}{\epsilon})^p)$ oracle calls, for some universal constant $C \geq 1$. The family is parametrized by $p\in[d]$ and provides an oracle-complexity/memory trade-off in the sub-polynomial regime $\ln\frac{1}{\epsilon}\gg\ln d$. While several works gave lower-bound trade-offs (impossibility results) -- we make explicit here their dependence on $\ln\frac{1}{\epsilon}$, showing that these also hold in any sub-polynomial regime -- to the best of our knowledge this is the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon\leq 1/\sqrt d$. The algorithms divide the $d$ variables into $p$ blocks and optimize over blocks sequentially, with approximate separation vectors constructed using a variant of Vaidya's method. In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p=d$ achieves the information-theoretic optimal memory usage and improves the oracle-complexity of gradient descent.

    Additional Results and Extensions for the paper "Probabilistic bounds on the $k$-Traveling Salesman Problem and the Traveling Repairman Problem"

    This technical report provides additional results for the main paper "Probabilistic bounds on the $k$-Traveling Salesman Problem ($k$-TSP) and the Traveling Repairman Problem (TRP)". For the $k$-TSP, we extend the probabilistic bounds derived in the main paper to the case of distributions with general densities. For the TRP, we propose a utility-based notion of fairness and derive constant-factor probabilistic bounds for this objective, thus extending the TRP bounds from the main paper to non-linear utilities.

    On the Length of Monotone Paths in Polyhedra

    Motivated by the problem of bounding the number of iterations of the Simplex algorithm, we investigate the possible lengths of monotone paths followed by the Simplex method inside the oriented graphs of polyhedra (oriented by the objective function). We consider both the shortest and the longest monotone paths and estimate the monotone diameter and height of polyhedra. Our analysis applies to transportation polytopes, matroid polytopes, matching polytopes, shortest-path polytopes, and the TSP, among others. We begin by showing that combinatorial cubes have monotone and Bland pivot height bounded by their dimension and that in fact all monotone paths of zonotopes are no larger than the number of edge directions of the zonotope. We later use this to show that several polytopes have polynomial-size pivot height, for all pivot rules. In contrast, we show that many well-known combinatorial polytopes have exponentially-long monotone paths. Surprisingly, for some famous pivot rules, e.g., greatest improvement and steepest edge, these same polytopes have polynomial-size simplex paths. Comment: 24 pages, 8 figures

    Universal Online Learning with Bounded Loss: Reduction to Binary Classification

    We study universal consistency of non-i.i.d. processes in the context of online learning. A stochastic process is said to admit universal consistency if there exists a learner that achieves vanishing average loss for any measurable response function on this process. When the loss function is unbounded, Blanchard et al. showed that the only processes admitting strong universal consistency are those taking a finite number of values almost surely. However, when the loss function is bounded, the class of processes admitting strong universal consistency is much richer and its characterization could depend on the response setting (Hanneke). In this paper, we show that this class of processes is independent of the response setting, thereby closing an open question (Hanneke, Open Problem 3). Specifically, we show that the class of processes that admit universal online learning is the same for binary classification as for multiclass classification with a countable number of classes. Consequently, any output setting with bounded loss can be reduced to binary classification. Our reduction is constructive and practical. Indeed, we show that the nearest neighbor algorithm is transported by our construction. For binary classification on a process admitting strong universal learning, we prove that nearest neighbor successfully learns at least all finite unions of intervals.
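To make the online setting in this abstract concrete, here is a minimal sketch of plain 1-nearest-neighbor online prediction on the unit interval, with the average 0-1 loss as the performance measure. It is not the paper's reduction or construction: the function names, the default prediction, and the example stream (a deterministic golden-ratio rotation labeled by the single interval [0.3, 0.6)) are assumptions chosen only for illustration.

```python
import math


def nearest_neighbor_online(stream, default=0):
    """Predict each label from the closest past point; return the average 0-1 loss."""
    history = []  # (x, y) pairs observed so far
    mistakes = 0
    rounds = 0
    for x, y in stream:
        if history:
            # The label of the nearest previously seen input is the prediction.
            _, prediction = min(history, key=lambda past: abs(past[0] - x))
        else:
            prediction = default
        if prediction != y:
            mistakes += 1
        history.append((x, y))  # the true label is revealed after predicting
        rounds += 1
    return mistakes / rounds if rounds else 0.0


def interval_stream(rounds):
    """Hypothetical example process: golden-ratio rotation, labeled by one interval."""
    golden = (math.sqrt(5) - 1) / 2
    for t in range(1, rounds + 1):
        x = (t * golden) % 1.0
        yield x, int(0.3 <= x < 0.6)


if __name__ == "__main__":
    for rounds in (100, 1000):
        print(rounds, nearest_neighbor_online(interval_stream(rounds)))
```

On this example stream, mistakes can only occur near the two endpoints of the interval, so the average loss shrinks as more points are observed.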

    Evaluation of Cellular Responses for the Diagnosis of Allergic Bronchopulmonary Mycosis: A Preliminary Study in Cystic Fibrosis Patients

    Background: Allergic bronchopulmonary mycosis (ABPM) is an underestimated allergic disease due to fungi. Most reported cases are caused by Aspergillus fumigatus (Af) and are referred to as allergic bronchopulmonary aspergillosis (ABPA). The main risk factor of ABPA is a history of lung disease, such as cystic fibrosis, asthma, or chronic obstructive pulmonary disease. The main diagnostic criteria for ABPA rely on the evaluation of humoral IgE and IgG responses to Af extracts, although these cannot discriminate Af sensitization and ABPA. Moreover, fungi other than Af have been incriminated. Flow cytometric evaluation of functional responses of basophils and lymphocytes in the context of allergic diseases is gaining momentum. Objectives: We hypothesized that the detection of functional responses through basophil and lymphocyte activation tests might be useful for ABPM diagnosis. We present here the results of a pilot study comparing the performance of these cellular assays vs. usual diagnostic criteria in a cystic fibrosis (CF) cohort. Methods: Ex vivo basophil activation test (BAT) is a diagnostic tool highlighting an immediate hypersensitivity mechanism against an allergen, e.g., through CD63 upregulation as an indirect measure of degranulation. Lymphocyte stimulation test (LST) relies on the upregulation of activation markers, such as CD69, after incubation with allergen(s), to explain delayed hypersensitivity. These assays were performed with Af, Penicillium, and Alternaria extracts in 29 adult CF patients. Results: BAT responses of ABPA patients were higher than those of sensitized or control CF patients. The highest LST result was for a woman who developed ABPA 3 months after the tests, despite the absence of specific IgG and IgE to Af at the time of the initial investigation. Conclusion: We conclude that basophil and lymphocyte activation tests could enhance the diagnosis of allergic mycosis, compared to usual humoral markers. Further studies with larger cohorts and addressing both mold extracts and mold relevant molecules are needed in order to confirm and extend the application of this personalized medicine approach.